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Supercomputing Path 


Enterprise Path 


Cloud Path 
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HPC Infrastructure 
Best Practices 




Complexity: “Composing a working HPC 
environment is difficult, time-consuming, 
requiring experts.” 
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Time to Solution: "I need to maximize 
application performance, scale workloads 
and minimize overhead.” 




Maintenance: “My IT staff doesn't have time
to update and test all the different software
components.”
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1t Hyperion Researc 


HPC ROI is very high: $507
revenue per dollar and $47
average profit (or cost
savings) per dollar invested in
HPC¹


Worldwide HPC revenue
expected to reach over $19
billion by 2024¹ (CAGR 6.8%,
down from 8.7% due to Covid
impacts)
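
The growth figures above follow the usual compound annual growth rate relation, end = start × (1 + rate)^years. The sketch below works through that arithmetic using only the $19 billion, 6.8% and 8.7% figures quoted above. The five-year horizon is an assumption, since the slide does not state the base year, and the function names are illustrative.

```python
def project(start: float, rate: float, years: int) -> float:
    """Compound a starting value forward at a fixed annual growth rate."""
    return start * (1.0 + rate) ** years


def implied_start(end: float, rate: float, years: int) -> float:
    """Work backwards from an end value to the starting value the rate implies."""
    return end / (1.0 + rate) ** years


if __name__ == "__main__":
    END_REVENUE_B = 19.0  # from the slide: over $19 billion by 2024
    YEARS = 5             # assumption: the horizon is not given in the source
    base = implied_start(END_REVENUE_B, 0.068, YEARS)
    print(f"Implied starting revenue: ${base:.1f}B")
    # Same base compounded at the earlier 8.7% forecast rate, for comparison
    print(f"At 8.7% CAGR instead: ${project(base, 0.087, YEARS):.1f}B")
```

Changing `YEARS` shows how sensitive the implied base is to the assumed horizon.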


The exascale race will drive 
new technologies 


Big data combined with HPC 
creating new solutions, 
adding many new buyers to 
the HPC space 





1 Hyperion Research, November 2020








Growing cloud-based HPC 
services with demand for 
faster answers to complex 
problems 


Storage systems will
become increasingly
critical


Cloud computing for HPC 
workloads will grow faster 


AI/ML will grow faster than
everything else
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Practical HPC 


Data Science to Business Results 


Data 


Science & AI/ML/DL
Modeling 


Data 





Data Analytics 
Inference and Practical AI


Predictive 
Modeling 
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Supercomputing 


IHV-driven (HPE/Cray/SGI, Fujitsu, Dell, Bull) 


International research labs (NCAR/NOAA, 
Kongsberg, ICHEC) 


Academia (U. of Bristol, U. of Leicester) 


High ROI 











Commercial OS Share in Top500 


(represents 107 supercomputers in the list) 


supercomputer HPC Market 


SUSE/CLE 41% 








Ubuntu 10%

Bullx 17%

The rest are running either
unspecified “Linux”, TOSS or
CentOS
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Commercial OS Share in Top 200 


(represents top 200 supercomputers in the list) 


supercomputer HPC Market 


SUSE/CLE 19% 








Other non-commercial OS 
include CentOS (19%), "Linux" 
(35%), TOSS (3%), VEOS (1%) and 
others (Sunway, Kylin, Scientific) 


CentOS & TOSS gain share in 
“smaller” supercomputers 
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Rise of Arm-based 
Supercomputing 


NVIDIA announced its first data center CPU,
an Arm-based processor that will deliver
10x the performance of today’s fastest
servers on the most complex AI and HPC
workloads


Catalyst UK established largest Arm-based 
supercomputer deployments in the world — 
more than 12,000 Arm-based cores running 
across three universities with strong 
benchmark results 


University of Edinburgh, University of Leicester, University of Bristol, Hewlett Packard Enterprise

SUSECON Digital

















Exascale race driving new 
technologies 


Unprecedented set of tools to address big, 
complex problems 


• New approaches for drug response prediction
• Global to micro weather forecasting
• Extreme-scale cosmological simulations
• Discovering materials for the creation of more efficient organic solar cells
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Leibniz 
Supercomputing 
Centre 


SuperMUC-NG runs SUSE on Lenovo ThinkSystem, 
6500 nodes, 26.9 PetaFlops 


Geophysicists use earthquake 
simulation software to investigate 
seismic waves beneath Earth's surface 


Calculations involved in this kind of simulation are 
so complex that they push even supercomputers 
to their limits 


Better prepare for future seismic events 
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BERIT 


Tokyo Institute of Technology 




TSUBAME touted as the “supercomputer for 
everyone” 
Medicine and simulating human organs 
Predicting traffic congestion or share prices 
Earthquake warnings and weather forecasting 


Social phenomenon analysis 
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NASA Asteroid 
Threat Assessment 
Project 


Pleiades supercomputer @ Ames 


NASA's asteroid research is shared with 
scientists at universities, national labs, and 
government agencies 


Develop assessment and response plans to 
look at damage to infrastructure, warning 
times, evacuations, and other options for 
protecting lives and property 
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Tangible benefits are being realized 





HPC Infrastructure Is Being Driven
Into Enterprise Markets¹


Competitive forces are driving 
companies to aim more complex 
questions at their data structures & push 
business operations closer to real time 


HPC moving in-house for scalability, 
ultrafast data movement and very large 
memory systems. 


Manufacturing is the largest commercial 
segment (incl. consumer goods) 


Financial Services is the second largest 
commercial segment 
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Commercial 


58%

• Financial services
• Large product manufacturing
• Bio-sciences
• Energy
• Consumer product manufacturing
• Retail
• Chemical
• Media & Entertainment
• Electronics
• Transportation
• Other commercial


Academic

17%

• Academic
• Not-for-profit


Government

25%

• National agency
• National security
• National research lab
• State or local government


1 Intersect360 Research, August 2020





Household 
Appliance Design 


Reduce time in the development of 
innovative ovens, washing machines and 
refrigerators 


Simulate and test more product variations
in a virtual design space before committing
to a physical prototype


Achieve higher sales, reduce
manufacturing costs and improve the
brand.
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Manufacturing & — —f- nn 
Materials 
Science 


Discovery: Discovers materials faster; mines
databases for “recipes”


Analysis: Predicts right compound 
combinations 


Modeling: Helps refine materials for 
optimum performance 
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Cloud 


Makes HPC resources available to 
scientists around the world 


HPC Bursting for on-demand processing 


CrunchYard, Azure, AWS, GCP partnerships 
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HPC in the Cloud Market 


HPC cloud spend projected to reach
~$9B by 2024¹


Covid-19 has accelerated cloud 
adoption 


HPC in the cloud is expected to grow 
more than 2.5 times faster than the 
on-prem HPC server market 


Major drivers include increased
number of AI/ML workloads in the
cloud and improvements in ease of
use and deployment of HPC jobs





1 Hyperion Research, November 2020

2020 HPC Cloud Forecast¹: HPC Cloud Spend ($M)
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All-in vs. Bursting 


Local Network Cloud 


Compute Nodes
Cluster 


Head Node 
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HPC “all-in” the cloud 
Includes the head, compute and storage nodes, 
with no hardware infrastructure to maintain 
Optimized cost and performance for scale-out 
applications 
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Local Network Cloud 





HPC bursting to hybrid/public clouds 
Address changing capacity needs 
Extend HPC jobs to the Cloud for on-demand 
scale and flexibility 
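
The bursting model above can be sketched as a simple placement policy: keep a job on-premises while free local cores remain, and send it to the cloud once local capacity runs out. This is an illustrative sketch only. The `Job` type, the `place_jobs` function and the core counts are hypothetical, and a real deployment would normally leave this decision to the workload manager.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Job:
    name: str   # hypothetical job identifier
    cores: int  # number of cores the job requests


def place_jobs(jobs: List[Job], local_free_cores: int) -> Tuple[List[Job], List[Job]]:
    """Greedy first-fit: keep jobs local while cores last, burst the rest."""
    local: List[Job] = []
    cloud: List[Job] = []
    for job in jobs:
        if job.cores <= local_free_cores:
            local.append(job)
            local_free_cores -= job.cores
        else:
            cloud.append(job)  # no room locally: candidate for cloud bursting
    return local, cloud


if __name__ == "__main__":
    queue = [Job("cfd", 64), Job("genomics", 128), Job("render", 32)]
    on_prem, burst = place_jobs(queue, local_free_cores=100)
    print("local:", [j.name for j in on_prem])
    print("cloud:", [j.name for j in burst])
```

A greedy first-fit rule is used here only because it is easy to follow. It ignores data locality, cost and queue priority, all of which would matter in practice.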
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Hyperloop 


Magnetic accelerators and compressed air 
bearings remove frictional forces 


Design anticipates operational 
maintenance and variables such as 
earthquakes, power outages and 
passenger fluctuations 


Design must be safe, reliable, affordable 
and self-powered 


HPC simulations enable iterative design 
changes and show results 
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Pharmaceuticals 
& Drug Research 


Experimentation: Predicts treatment results 
accurately 


Discovery: Improves drug design and 
discovery 


Treatment: Enables better disease 
management; enables precision medicine 
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Containers 


Kubernetes lacks high-performance 
scheduling capabilities 


HPC Container platforms (Rancher, 
Singularity, UberCloud) 


Data Scientists rely on containers across 
environments 

















What Makes a Good 
Container Solution 
for HPC? 


• Compatible with HPC environments

• Designed for performance (support for GPU accelerators)

• Works with many resource managers (Slurm, Altair, Bright Computing, Univa)

• GUI support for managing jobs and services

• Secure and compliant
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Simplify access to 
accelerator technology 
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OpenACC is a directive-based 
programming model designed to 
provide performance and portability 
for CPUs, GPUs and other accelerators 
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RANCHER illumina’ 


Leading Producer 
of DNA Sequencers 


Market leader in genetic sequencing, Illumina's
research drives work on disease, drug reaction
and agriculture


Provided a migration path for a system with 25K 
HPC computing cores and 20 petabytes of 
storage 


Traditional HPC distributes compute over many
nodes, working on a smaller number of large files
with data transfer between multiple nodes


Open source container platforms are designed to 
be simple, fast, and secure — optimized for HPC 
workloads 
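
The idea of distributing compute over many nodes can be illustrated with a block decomposition, where a range of work items is split as evenly as possible across workers. This is a minimal sketch. The `block_partition` helper is hypothetical and stands in for what an MPI program would typically derive from its rank and the communicator size.

```python
from typing import List, Tuple


def block_partition(n_items: int, n_workers: int) -> List[Tuple[int, int]]:
    """Return half-open (start, stop) index ranges, one per worker.

    The first n_items % n_workers workers receive one extra item, so the
    range sizes differ by at most one.
    """
    if n_workers <= 0:
        raise ValueError("n_workers must be positive")
    base, extra = divmod(n_items, n_workers)
    ranges: List[Tuple[int, int]] = []
    start = 0
    for worker in range(n_workers):
        size = base + (1 if worker < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


if __name__ == "__main__":
    for worker, (lo, hi) in enumerate(block_partition(10, 4)):
        print(f"worker {worker}: items [{lo}, {hi})")
```

Each worker then processes only its own index range, and data is exchanged between workers only where ranges meet.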
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full capability of a Rubernetes cluster 


AI Infrastructure
for Edge


• K3ai is a lightweight, fully automated AI infrastructure-in-a-box solution that allows anyone to experiment quickly with Kubeflow pipelines

• Infrastructure for edge devices with full capability of a Kubernetes cluster

• Includes Kubernetes based on K3s from Rancher, Kubeflow pipelines, NVIDIA GPU support, NVIDIA Triton inference server and more

• Runs on Arm and x86
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Sustainable 
Energy & Utilities 


Diversify and increase the pace of innovation
while improving infrastructure efficiencies


Increase responsiveness by aligning 
product supply and achieve safe and 
compliant operations 


Modeling reduces cost and risk 


Limits environmental impacts; optimizes
supply/demand; enables proactive maintenance


Enables smart allocation of energy 
resources 
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Automotive 
Design & 
Manufacturing 


Design: Enables effective design 
simulations 


Connected vehicles: Powers advanced 
safety features; cloud services for available 
data; driver monitoring 


Manufacturing: Speeds up design,
increases quality and reduces R&D costs
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SUSE Linux Enterprise 
High Performance 
Computing 


Bundles popular HPC tools and libraries 
All packages supported by SUSE 
Available for x86-64 and Arm64 


Flexible release schedule 








SUSE HPC Module 


Base OS & Architecture
SLE HPC 15 (HPC Module w/subscription) supports Aarch64 & x86-64

Management Tools
SLURM (scalable workload manager)
conman (the console manager)
genders (cluster configuration database)
lua-lmod (environment module system)
munge (authentication service for user credentials)
mrsh (set of remote shell programs using munge)
pdsh (high performance, parallel remote shell utility)
prun (script-based wrapper for launching parallel jobs)
clustduct (script which glues the genders database to dnsmasq)
powerman (cluster power control)
cpuid (obtain CPU details)

I/O Services
Memkind (heap manager for memory control)
RASDaemon (RAS reports via kernel tracing)

Parallel Libraries
boost (portable C++ source library)
gsl (GNU Scientific Library)
FFTW3 (Fourier transforms computing library)
ScaLAPACK (scalable linear algebra package)
hypre (parallel solvers for sparse linear systems)
mumps (multifrontal massively parallel sparse direct solver)
Scotch (graph & mesh/hypergraph partitioning, graph clustering)
trilinos (large-scale complex multi-physics & scientific problems)
PETSc (suite of data structures for partial differential equations)

Serial Libraries
OpenBLAS (optimized BLAS library)
SuperLU (supernodal sparse direct solver)

Compilers
GCC (GNU Compiler Collection includes C++ & Fortran)

Monitoring Tools
ganglia (ganglia monitoring core)
ganglia-web (ganglia web front-end)
icinga2 (monitoring platform core)
prometheus slurm exporter (Prometheus exporter for perf metrics)

I/O Libraries
adios (adaptable I/O system for exascale)
hdf5 (data model, library and file format for managing data)
netcdf (Unidata network Common Data Form)
netcdf-cxx4 (C++ libraries and utilities)
netcdf-fortran (netCDF Fortran libraries)
pnetcdf (parallel I/O library for NetCDF file access)

Message Passing Interface
openmpi3/openmpi4 (Message Passing Interface implementation)
mvapich2 (MPI over InfiniBand, Omni-Path, RoCE, iWARP)
mpich (high-performance portable implementation of MPI)
openPMIx (Process Management Interface Exascale standard)

Development Tools
metis (serial graph partitioning & matrix ordering)
hwloc (hardware locality)
python-numpy (scientific computing with Python)
python-scipy (for math, science and engineering)

Performance Tools
mpiP (lightweight MPI profiler)
imb (Intel MPI benchmarks)
papi (Performance Application Programming Interface)

SUSE Package Hub
robinhood (policy engine to monitor filesystem contents)
singularity (HPC application containers)
charliecloud (lightweight user-defined HPC stack)
clustershell (scalable cluster admin Python framework)
warewulf (scalable systems management suite for HPC clusters)








Ref Architectures: OpenHPC vs. SUSE SLE HPC 15

Base OS
OpenHPC: openSUSE Leap 15, CentOS 8
SUSE: SLE HPC 15

Architecture
OpenHPC: Aarch64, x86-64
SUSE: Aarch64, x86-64

Administrative Tools
OpenHPC: conman, docs, examples, genders, lmod-defaults, lmod, losf, mrsh, nhc, pdsh, prun, test-suite
SUSE: conman, genders, lua-lmod, pdsh, prun, mrsh, cpuid, powerman

Provisioning Tools
OpenHPC: warewulf
SUSE: SLE HPC

Resource Management
OpenHPC: SLURM, OpenPBS, pmix, magpie
SUSE: SLURM, Munge (Altair PBS available separately)

Runtimes
OpenHPC: charliecloud, singularity
SUSE: Singularity & Charliecloud available via Package Hub

I/O Services
OpenHPC: Memkind, RASDaemon
SUSE: Memkind, RASDaemon

Parallel Libraries
OpenHPC: boost, gsl, FFTW, hypre, mfem, mumps, opencoarrays, petsc, scotch, scalapack, slepc, superlu_dist, trilinos
SUSE: GSL, FFTW, Hypre, MUMPS, PETSc, ScaLAPACK, Scotch, Trilinos

Serial/Threaded Libraries
OpenHPC: R, openblas, plasma, superlu
SUSE: OpenBLAS, SuperLU

I/O Libraries
OpenHPC: adios, HDF5, NetCDF, NetCDF-cxx, NetCDF-F, pnetcdf, sionlib
SUSE: HDF5, ADIOS, NetCDF, NetCDF-cxx, NetCDF-FORTRAN

Compiler Families
OpenHPC: GNU (GCC, G++, GFORTRAN), Intel/Arm compiler compatibility
SUSE: GNU (GCC, G++, GFORTRAN)

MPI Runtime/Transport Families
OpenHPC: Intel MPI compatibility, libfabric, mpich, mvapich2, openmpi4, UCX
SUSE: mvapich2, openMPI3, openMPI4, mpich, OpenPMIx

Development Tools
OpenHPC: EasyBuild, autoconf, automake, cmake, hwloc, metis, libtool, python3-mpi4py, python3-numpy, python3-scipy, spack, valgrind
SUSE: Hwloc, python3-NumPy, python3-SciPy, Metis

Performance Tools
OpenHPC: dimemas, extrae, geopm, imb, likwid, msr-safe, omb, papi, pdtoolkit, scalasca, score-p, tau
SUSE: PAPI, mpiP, imb

Lustre Packages
OpenHPC: lustre-client (Whamcloud)
SUSE: Whamcloud available separately

















In Conclusion


SUSE delivers an HPC platform that
makes those solutions better
(easier to develop, faster and more
manageable)


Businesses around the world today
are recognizing that a Linux-based
HPC infrastructure is vital to
supporting the analytics
applications of tomorrow


HPC is not just for scientific research
any longer, and is being adopted
across banking, healthcare, utilities
and manufacturing
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Ja i ns 
"I don't want to be the 9 
embarrassment of the
galaxy, to have had the
power to deflect an
asteroid, and then not,
and end up going
extinct. We’d be the
laughing-stock of the
aliens of the cosmos if
that were the case.”


Neil deGrasse Tyson, astrophysicist
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For more information, contact SUSE at: 
+1 800 796 3700 (U.S./Canada) 


+49 (0)91-740 53-0 (Worldwide) 
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Other AI/HPC 
Projects 


FUSeML aims
to provide an MLOps framework that
integrates the AI/ML tools of your choice.


Phoebe
adds basic artificial intelligence
capabilities to the Linux OS.


Nearly 10,000 GitHub projects for HPC 
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Reference Architecture 


SUSE Linux Enterprise High Performance Computing 
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Compute Cluster 


Compute Nodes

Head Node

Failover Head Node

Administration
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Login Node Login Node Login Node 





Message Passing Interface
Process Launching and Monitoring


High Bandwidth Fabric
High Performance Storage


Lustre, Spectrum Scale (GPFS)
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Slurm workload management 
Memory allocation 
Parallel task executor 
Parallel remote commands 
Power management for clusters 
Monitor kernel RAS 
File system library 
Console access 
CPU identification 
Remote shell programs 
Portable Hardware Locality 


Authentication service 


User environment management 


Terminal management 


HPC Module 





